The heatmap of the word matrix for SARS-CoV and SARS-CoV-2 genome data (3-mer word frequency matrix).
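As a minimal sketch of how a 3-mer word frequency matrix of this kind could be built, the following Python snippet counts overlapping 3-letter words in each sequence and normalizes the counts to relative frequencies. The function name `kmer_frequencies`, the choice of relative frequencies rather than raw counts, and the two short sequences are illustrative assumptions; they are not the genome data or the exact procedure used for the figure.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return the relative frequencies of all len(alphabet)**k words.

    Words are listed in lexicographic order (AAA, AAC, ..., TTT for k=3).
    Words containing letters outside the alphabet (e.g. N) are ignored.
    """
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


# Hypothetical toy sequences, NOT real SARS-CoV or SARS-CoV-2 genomes.
toy_sequences = {
    "seq_A": "ATGCGATACGCTTGA",
    "seq_B": "ATGTTTGTTTTTCTT",
}

# One row per sequence, one column per 3-mer word (64 columns).
word_matrix = {name: kmer_frequencies(s) for name, s in toy_sequences.items()}
```

Each row of `word_matrix` has 64 entries that sum to one, so rows from genomes of different lengths remain comparable when drawn as a heatmap.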

Figure 7.17. (a) The PCA map of the 3-mer word matrix for the SARS-CoV and SARS-CoV-2 genome sequences. (b) The ROC curves of the k-mer machine for discriminating between SARS-CoV and SARS-CoV-2 genome sequences.
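To make the ROC curve in panel (b) concrete, the sketch below shows one common way to compute ROC points and the area under the curve (AUC) from classifier scores, by sweeping a decision threshold from high to low. The function names, the labels, and the scores are hypothetical and chosen only for illustration; they are not outputs of the k-mer machine.

```python
def roc_curve_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels: 1 for the positive class, 0 for the negative class.
    scores: higher score means "more likely positive".
    Assumes both classes are present at least once.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Samples with tied scores cross the threshold together.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if labels[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores: every positive outranks every negative.
separable = roc_curve_points([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])

# Hypothetical scores with one misordered pair out of four.
mixed = roc_curve_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6])
```

When the two groups are perfectly separated, the curve rises straight to a true positive rate of one before any false positive occurs, and the AUC equals one; any overlap in the scores pulls the AUC below one.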

Figure 7.17(a) shows the PCA analysis for the word matrix. Again, it shows perfect discrimination power between the two groups of SARS genome sequences. The discrimination power of the k-mer word frequency library has already been researched in the literature [Ghandi, 2016; Fletez-Brant, et al., 2016; Lee, 2016; Beer, 2017; Shrikumar, 2019]. Such a classifier is referred to as a k-mer machine. Three classification algorithms were used, hence three k-mer machine models